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60/060,638 filed October 1, 1997, the contents of which are incorporated herein by 
reference in their entirety. 

I, FIELD OF THE INVENTION 

The present invention pertains to a predictive device that models the dynamic 
input/output relationships of a physical process, particularly in the process industries such 
as hydrocarbons, polymers, pulp and paper, and utilities. The predictive device is 
primarily for multivariable process control, but is also applicable to dynamic process 
monitoring, or to provide a continuous stream of inferred measurements in place of costly 
or infrequent laboratory or analyzer measurements. 

20 II BACKGROUND OF THE INVENTION 

Most existing industrial products designed for multivariable model predictive 
control (MPC) employ linear step-response models or finite impulse response (FIR) 
models. These approaches result in over-parameterization of the models (Qin and 
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Badgwell, 1996). For example, the dynamics of a first order single input/single output 
SISO process which can be represented with only three parameters (gain, time constant 
and dead-time) in a parametric form typically require from 30 to 120 coefficients to 
describe in a step-response or FIR model. This over-parameterization problem is 
5 exacerbated for non-linear models since standard non-parametric approaches, such as 
Volterra series, lead to an exponential growth in the number of parameters to be 
identified. An altemative way to overcome these problems for non-linear systems is the 
use of parametric models such as input-output Nonlinear Auto-Regressive with 
exogenous inputs (NARX). Though NARX models are found in many case-studies, a 
10 problem with NARX models using feed forward neural networks is that they offer only 
short-term predictions (Su, et al, 1992). MPC controllers require dynamic models capable 
of providing long-term predictions. Recurrent neural networks with internal or external 

CI feedback connections provide a better solution to the long-term prediction problem, but 

iri training such models is very difficult. 

§1 15 The approach described in (Graettinger, et al, 1994) and (Zhao, et al, 1997) 

provides a partial solution to this dilemma. The process model is identified based on a set 
^ of decoupled first order dynamic filters. The use of a group of first order dynamic filters 

Ill in the input layer of the model enhances noise immunity by eliminating the output 

JCJ: interaction found in NARX models. This structure circumvents the difficulty of training a 

'^^ 20 recurrent neural network, while achieving good long-term predictions. However, using 
this structure to identify process responses that are second order or higher Can result in 
over sensitive coefficients and in undesirable interactions between the first order filters. 
In addition, this approach usually results in an oversized model structure in order to 
achieve sufficient accuracy, and the model is not capable of modeling complex dynamics 
25 such as oscillatory effects. In the single input variable case, this first order structure is a 
special case of a more general nonlinear modeling approach described (Sentoni et al., 
1996) that is proven to be able to approximate any discrete, causal, time invariant, 
nonlinear SISO process with fading memory. In this approach a Laguerre expansion 
creates a cascaded configuration of a low pass and several identical band pass first order 
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filters. One of the problems of this approach is that may it require an excessively large 
degree of expansion to obtain sufficient accuracy. Also, it has not been known until now 
how to extend this methodology in a practical way to a multi-input system. 

This invention addresses many essential issues for practical non-linear 
multivariable MPC. It provides the capabihty to accurately identify non-linear dynamic 
processes with a structure that 

• has close to minimum parameterization 

• can be practically identified with sufficient accuracy 

• makes good physical sense and allows incorporation of process knowledge 

• can be proven to identify a large class of practical processes 

• can provide the necessary information for process control 

III SUMMARY OF THE INVENTION 

The present invention is a dynamic predictive device that predicts or estimates 
values of process variables that are dynamically dependent on other measured process 
variables. This invention is especially suited to appHcation in a model predictive control 
(MPC) system. The predictive device receives input data under the control of an external 
device controller. The predictive device operates in either configuration mode or one of 
three runtime modes -prediction mode, horizon mode, or reverse horizon mode. 

The primary runtime mode is the prediction mode, hi this mode, the input data are 
such as might be received from a distributed control system (DCS) as found in a 
manufacturing process. The device controller ensures that a contiguous stream of data 
from the DCS is provided to the predictive device at a synchronous discrete base sample 
time. The device controller operates the predictive device once per base sample time and 
receives the prediction from the output of the predictive device. 

After the prediction mode output is available, the device controller can switch to 
horizon mode in the interval before the next base sample time. The predictive device can 
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be operated many times during this interval and thus the device controller can conduct a 
series of experimental scenarios in which a sequence of input data can be specified by the 
device controller. The sequence of input data can be thought of as a data path the inputs 
will follow over a forward horizon. The sequence of predictions at the output of the 
controller is a predicted output path over a prediction horizon and is passed to the device 
controller for analysis, optimization, or control. The device controller informs the 
predictive device at the start of an experimental path and synchronizes the presentation of 
the path with the operation of the device. Internally, horizon mode operates exactly the 
same way as prediction mode, except that the dynamic states are maintained separately so 
that the predictive device can resume normal prediction mode operation at the next base 
sample time. In addition, the outputs of the filter units are buffered over the course of the 
path and are used during reverse horizon operation of the device. 

The purpose of reverse horizon mode is to obtain the sensitivities of the predictive 
device to changes in an input path. Reverse horizon mode can only be set after horizon 
mode operation has occurred. The device controller first informs the predictive device the 
index of the point in the output path for which sensitivities are required. The device 
controller then synchronizes the reverse operation of the predictive device with the output 
of sensitivity data at the input paths of the device. 

In forward operation, each input is scaled and shaped by a preprocessing unit 
before being passed to a corresponding delay unit which time-aligns data to resolve dead 
time effects such as pipeUne transport delay. Modeling dead-times is an important issue 
for an MPC system. In practical MPC, prediction horizons are usually set large enough so 
that both dynamics and dead-time effects are taken into account; otherwise the optimal 
control path may be based on short term information, and the control behavior may 
become oscillatory or unstable. In the preferred embodiment, the predictive device is 
predicting a single measurement, and the dead-time units ahgn data relative to the time of 
that measurement. If predictions at several measurement points are required, then several 
predictive devices are used in parallel. During configuration mode, the dead times are 
automatically estimated using training data collected firom the plant, hi the preferred 
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embodiment the training method consists of constructing individual auto-regressive 
models between each input and the output at a variety of dead-times, and choosing the 
dead time corresponding to the best such model. As with other components of the 
invention, manual override of the automatic settings is possible and should be used if 
5 there is additional process knowledge that allows a more appropriate setting. 

Each dead time unit feeds a dynamic filter unit. The dynamic filter units are used 
to represent the dynamic information in the process. Internally the dynamic filter units 
recursively maintain a vector of states. The states derive their values from states at the 
previous time step and fi-om the current input value. This general filter type can be 
10 represented by what is known to those skilled in the art as a discrete state space equation. 
The preferred embodiment imposes a much-simplified structure on the filter unit that 
Q allows for fast computation for MFC and also allows intelligent override of the automatic 

y settings. This simplified stinctiire is composed of first and second order loosely coupled 

tl subfilters, only one of which receives direct input from the corresponding delay unit. The 

il 1 5 practical identification of this filter structure is an essential part of this invention. 

£ The outputs of the dynamic filter units are passed to a non-lmear analyzer that 

ri embodies a static mapping of the filter states to an output value. The exact nature of the 

^2 non-linear analyzer is not fimdamental to this invention. It can embody a non-linear 

Q mapping such as a Non-linear Partial Least Squares model or a Neural Network, or a 

20 hybrid combination of linear model and non-linear model The preferred embodiment 
makes use of a hybrid model. The reason for this is that a non-parametric non-linear 
model identified from dynamic data (such as a neural net) cannot, by its nature, be fiiUy 
analyzed and validated prior to use. The non-linearity of the model means that different 
dynamic responses will be seen at different operating points. If the process being modeled 
25 is truly non-linear, these dynamic responses will be an improvement over linear dynamic 
models in operating regions corresponding to the training data, but may be erroneous in 
previously unseen operating regions. When the non-linear model is used within the 
context of MPC, erroneous responses, especially those indicating persistent and invalid 
gain reversals can create instabilities in the MPC controller. With a hybrid approach, a 
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non-linear model is used to model the errors between the linear dynamic model and the 
true process. The hybrid dynamic model is a parallel combination of the linear dynamic 
model with the error correction model The dynamic response of the linear model can be 
analyzed completely prior to use, since the gains are fixed and independent of the 
operating point. The process engineer can examine and approve these gains prior to 
closing the loop on the process and is assured of responses consistent with the true 
process. However, the Unear dynamic response will be sub-optimal for truly non-linear 
processes. In online operation of the hybrid model within an MPC framework, the 
responses of the linear model and the hybrid model can be monitored independently and 
compared. In operating regions where the non-linear model shows persistently poor 
response, control can be switched, either automatically or by the operator, back to the 
safety of the linear model. 

The output of the non-linear analyzer is passed through a postprocessing unit that 
converts the internal units to engineering units. 

The importance of this invention is that its structure is shown to be able to 
approximate a large class of non-linear processes (any discrete, causal, time invariant, 
nonlinear multi-input/single output (MISO) process with fading memory), but is still 
simple enough to allow incorporation of process knowledge, is computationally fast 
enough for practical non-linear MPC, and can be configured with sufficient accuracy in a 
practical manner. 

IV BRIEF DESCRIPTION OF THE DRAWINGS 

The textual description of the present invention makes detailed reference to the 
following drawings: 

FIG, i is an overall block diagram of the invention showing both the runtime and training 
components. 

FIG, 2 shows the runtime structure of an individual preprocessing imit. 
FIG. 3 shows the runtime structure of an individual delay unit. 
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FIG. 4 shows the forward flow internal decomposition of an individual filter unit into 
cascaded subfilter units. 

FIG. 5 shows the preferred forward flow structure of a primary first order subfilter unit. 

FIG. 6 shows the preferred forward flow structure of a secondary first order subfilter unit 
5 and the preferred coupling with the previous subfilter in the cascade. 

FIG, 7 shows the preferred forward flow structure of a primary second order subfilter 
unit. 

FIG, 8 shows the preferred forward flow structure of a secondary second order subfilter 
unit and the preferred coupling with the previous subfilter in the cascade. 

10 FIG. 9 shows a typical feedforward configuration of the non-hnear analyzer. 

FIG 10 shows the reverse flow configuration of the non-linear analyzer depicted in FIG. 
9. 

FIG. 11 shows the reverse flow intemal decomposition of an individual filter unit into 
cascaded subfilter units. 
1 5 FIG. 1 2 shows a method of training an individual delay unit. 

FIG. 13 shows the fu-st order decoupled structure used at the start of each iteration of the 
preferred dynamic filter unit identification method. 

FIG. 14 shows that reverse flow of data through a matrix structure can be described 
mathematically by forward flow of data through the transpose matrix structure. 

20 V DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 

FIG J is an overall block diagram of the invention and its context. An external 
device controller (50) synchronizes the flow of data to and firom the predictive device via 
the data paths (18), (14), and (64). The device controller also controls the mode of 
operation and the path stepping of the predictive device via the control path (54). The 
25 external device controller may also communicate with a DCS (10) or other data/control 
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system both for requesting data and for requesting control changes to the modeled 
process; however the exact external context and configuration of the device controller is 
beyond the scope of this appUcation. 

V.1 FORWARD RUNTIME OPERATION OF THE PREDICTION DEVICE 

The figures and equations in this detailed description refer to an index k that 

represents a data point in a sequence of data points. This index has different meanings 
depending on whether the forward operational mode of the device is prediction mode or 
horizon mode. 

In prediction mode data is provided at a regular sampling interval At to the input 
nodes (18) of the device. Data is passed in a forward direction through the device. For 
simplicity of notation, the sample point To-^kAt is denoted by the index t 

In horizon mode, a sequence of data representing a forward data path is provided 
to the inputs. This data path may represent a proposed path for manipulated variables for 
process control purposes, or may represent a holding of the inputs to constant values in 
order to determine the steady state output of the device. The starting point of this path is 
taken to be the most recent input sample provided in prediction mode. Index 0 represents 
this starting point and index k represents the I^^ data point in this path. 

V.1.1 FORWARD RUNTIME OPERATION OF A PREPROCESSING UNIT 

Each input feeds a preprocessing unit (20) which is used to convert the 
engineering units of each data value to a common normalized unit whose lower and upper 
limits are, by preference, -1 and 1 respectively, or 0 and 1 respectively. 

The preprocessing unit can also shape the data by passing it through a non-linear 
transformation. However, the preferred embodiment uses a simple scale and offset as 
shown in FIG. 2 and equation (1): 

u{k)-su^{k)'\-o (1) 
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where iiE(k) is the value of an input in engineering units, and u(k) is the preprocessed 
value in normaHzed units. The scale and offset values as stored in the configuration file 
(30 - FIG. 1) are, in general, different for each input variable, and are determined in the 
configuration mode. 

V.1.2 FORWARD RUNTIME OPERATION OF A DELAY UNIT 

Data flows from each preprocessing unit to a corresponding delay unit (22). The 
forward run- time operation of the delay unit (22) is shown in FIG 3 and equation (2). The 
output u'^(k)(304) of an individual delay unit (300) is equal to the input u(k) (302) delayed 
by d sample times. The value of d may be different for each delay unit (22) and is 
retrieved fi-om the configuration file (30 - FIG, J), This may be implemented as a shift 
register with a tap at the ct^ unit. 

u'{k) = u{k'-d) (2) 

This equation can also be written in terms of the unit delay operator q-^: 



u'ik) = q''u{k) 

V.1.3 FORWARD RUNTIME OPERATION OF THE FILTER UNITS 

Referring again to FIG. 7, each delayed input value is passed to an individual 
filter imit (24). The general internal feedforward structure of a filter unit (24) is shown in 
FIG. 4. The general feedforward structure is composed of .S" cascaded subfilters (402, 404, 

406). The first subfilter in the cascade (400) is referred to as the primary subfilter. 
Non-primary subfilters are referred to as secondary subfilters. All the subfilters are alike 
except that the primary subfilter receives no input firom another subfilter, and the final 
subfilter sends no output to another subfilter. Now the general form of the primary 
subfilter will be described in detail. 

The primary subfilter maintains a vector (412) of states x/(k) at each time t An 
internal single time step delay unit (414) feeds the vector state to a coupling unit (420) 
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and to a matrix unit (416). The matrix unit converts the delayed state vector (418) and 
feeds it to a vector addition unit (408). The input to the filter unit u^(k) is expanded and 
linearly scaled by the input coupling unit (410) to a vector of values of the same 
dimension as the state vector. The vector addition unit then combines its two input 
5 streams to produce the vector of states for the current time. The operation just described 
for the primary subfilter is conveniently described in mathematical matrix and column 
vector notation as: 

X, [k) = A,x, {k - l)^h,u'{k) (3) 

Such an equation is known, to those skilled in the art, as a linear state space equation with 
10 a single input. If no structure is imposed on Ai or bi, then further subfilters are 
^ unnecessary since the cascaded subfilter structure can subsumed into a single complicated 

Q primary subfilter. However, the preferred subfilter structures as described below, or 

similar to those described below, are essential for a practical embodiment and application 
of the invention. 

15 The subfilter coupling unit (420) determines how state values at time k-1 affect 

^ the state units in the next subfilter at time L In mathematical terms, the subfilter coupling 

HI unit uses the coupling matrix to perform a linear transformation of state vector xi(k-l) 

Q which is passed to the vector addition unit of the next subfilter. The operation of a 

^''^ secondary subfilter is conveniently described in mathematical matrix and vector notations 

20 as: 

X, W = A, X, - 1) + r,x,., {k-\)^hy {k) (4) 

In the preferred embodiment, the subfilters are all of first or second order. A first 
order subfilter maintains just one state. The preferred embodiment for a first order 
primary subfilter (500) is shown in FIG. 5, The vectorizing unit (502) and the matrix unit 
25 (504) collapse to become scaling operations so that the state vector (506) is represented 
by: 

x,{k)^^iX,{k-l)^{l-A,y{k) (5) 



m 
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The preferred embodiment for a first order secondary subfilter (600) is shown in 
FIG, 6. The secondary subfilter receives no direct input, but instead receives cascaded 
input from the previous subfilter. The preferred coupling is a loose coupling scheme 
(602) in which only the last state component of the previous subfilter contributes. Note 
5 that the previous subfilter is not required to be a first order subfilter. The state vector 
(606) is represented by: 

where the matrix unit X ^ (604) is a scalar. 

Second order subfilters maintain two states. The preferred embodiment for a 
10 second order primary subfilter (700) is shown in FIG. 7. In this figure, the state vector 
xi(k) is shown in terms of its two components Xti(k) (708) and X!2(k) (710). The 
vectorizing unit (702) creates two inputs to the vector addition unit (714), the second of 
which is fixed at zero. The delayed states (704) and (706) are fed to the matrix unit (712) 
whose outputs are also fed to the vector addition unit (712) which adds the matrix 
15 transformed states to the vectorized inputs producing the current state. Note that due to 
the (1,0) structure of the second matrix row, and the zero second component of the 
vectorizing unit component, the current second state component (710) is just equal to the 
delayed first component (704): 

{k) = cin^u - 1) + ^12^12 {k-l)-^{l-a,,-a,^y {k) 
^i2W = ^u(^-l) 

20 The preferred embodiment for a second order secondary subfilter (800) is shown 

in FIG. 8. In this figure, the state vector jc^f^ is shown in terms of its two components 
Xsj(k) (808) and Xs2(k) (810). The preferred coupling with the previous subfilter unit is a 
loose coupUng scheme (802) in which only the last state component of the previous 
subfilter contributes to the first state component of the current subfilter. Note that the 

25 previous subfilter is not required to be a first order subfilter or second order subfilter. The 
output of the coupling unit is fed to the addition unit (814). The delayed states (804) and 
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(806) are fed to the state matrix unit (812) whose outputs are also fed to the vector 
addition unit (812) which adds the state matrix transformed states to the output of the 
coupling unit, producing the current state. Note that due to the (1,0) structure of the 
second state matrix row, and the zero second row of the coupling matrix, the current 
5 second state component (810) is just equal to the delayed first component (804): 

If the device is operating in horizon mode current states along the path are 
maintained in a separate storage area so as not to corrupt the prediction mode states. In 
horizon mode, k indexes the input path and the states are initialized at the start of the path 
10 (k =0) to the prediction mode states. In addition the states at the output of the filter unit 
are buffered for use in reverse horizon mode. 



ifl V.14 FORWARD RUNTIME OPERATION OF THE NON-LINEAR ANALYZER 

^ Referring again to FIG, /, the outputs (28) of the filter units (24) provide input to 

the non-linear analyzer (26). The exact structure and configuration of the non-Unear 
15 analyzer (26) is not central to this application. It is the interaction of the non-linear 
analyzer (26) with the filter units (24), and the operation and configuration of the filter 
Q units (24) that forms the core of this invention. The preferred embodiment, for reasons 

discussed in the summary of the invention is a hybrid parallel combination of linear and 
non-linear. However, for clarity of explanation, a standard neural network structure is 
20 described which is well known to those skilled in the art. This structure is shown in FIG 
P. The equations for this structure are: 

;7,W=tanhfe,W) (9) 
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V.1.5 FORWARD RUNTIME OPERATION OF THE POSTPROCESSING UNIT 

The postprocessing unit (32) in FIG. 1 is used to scale the output from the 
normalized units to engineering units. The postprocessing unit can also shape the data by 
passing it through a non-linear transformation. However, the preferred embodiment uses a 
5 simple scale and offset. For consistency with the preprocessing units, the scale and offset 
represent the mapping from engineering units to normalized units. 

yAk) = -y(k)-- (10) 

S S 

The scale and offset values as stored in the configuration file (30 - FIG, 1) and are 
determined in the configuration mode. 

10 V.2 REVERSE RUNTIME OPERATION OF THE PREDICTION DEVICE 

The reverse horizon mode of operation is only allowed immediately following 
III horizon mode operation. Horizon mode operation buffers the states (28) output by the 

filter units (24) over the course of the forward path. The purpose oi reverse horizon mode 
is to obtain the sensitivity of any ^oini y(k) of the prediction path (output by the device in 
1 5 horizon mode) with respect to any point in the input path u(l). 

In order to use the invention for process control applications, the mathematical 
derivatives of the prediction with respect to the inputs are required. The mathematical 
derivatives measure how sensitive a state is in response to a small change in an input. The 
dynamic nature of the predictive device means that a change in input at time k will start to 
20 have an effect on the output as soon as the minimum dead-time has passed and will 

continue to have an effect infinitely into the future. In most practical appUcations systems 
are identified to have fading memory so that the effect into the fixture recedes with time. 
For MPC applications the aim is to plan a sequence of moves for the inputs 
corresponding to manipulated variables (MVs). The effect of these moves needs to be 
25 predicted on the controlled variables (CVs) along a prediction path. A constrained 

optimization algorithm is then used to find the move sequences that predict an optimal 
prediction path according to some desired criteria. 
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In reverse horizon mode, the external device controller specifies the output path 
index k. The device then outputs in sequence the sensitivities (64) in reverse order at the 
input nodes of the device. In the detailed description below, the sensitivity of the output 
y^ijc) of ttie device with respect to any variable v is represented by Q.^v .It is this 
5 sensitivity value, rather than an external data value that is fed back through the device 
when operating in reverse horizon mode. 

V.2.1 REVERSE RUNTIME OPERATION OF THE POSTPROCESSING UNIT 

The reverse operation of the postprocessing unit (32) is to scale data received at 
its output node using the inverse of the feedforward scaling shown in equation (10): 

10 ^Ak) = s^kyE{k) (11) 

Since the sensitivity of the output with respect to itself is: 

€l,yM = ^ (12) 
the postprocessing unit always receives the value of 1 at its output node in reverse 

m 

s operation. 

m. 15 V.2.2 REVERSE RUNTIME OPERATION OF THE NON-LINEAR ANALYZER 
J5 The reverse runtime operation of a neural net model is well known to those skilled 

in the art and is shown in FIG. 10. The output from the reverse operation of the 
postprocessing unit Q.i,y{k) is presented at the output node of the non-linear analyzer 
(26). The information flows in a reverse manner through the non-linear analyzer (26) and 
20 the resultmg sensitivities (62) are output at the input nodes of the non-linear analyzer 
(26): 

h=l 
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V.2.3 REVERSE RUNTIME OPERATION OF A FILTER UNIT 

The effect of a change in the delayed input u^(l) on a the sequence of states being 
output from a filter unit (24) in horizon mode is complex due to the dependencies of a 
subfilter's states based on the previous subfilter's states and on the subfilter's previous 
states. An efficient solution can be derived using the chain rule for ordered derivatives 
(Werbos, 1994) and is achieved by the reverse operation of the filter unit (24). In reverse 
horizon mode, the output of each filter unit (24) receives the vector of sensitivities 
Q^x^ (a) propagated back from the non-linear analyzer (26) operating in reverse mode: 



^ A[ (Q, X, (/ + 1)) + (Q,x,,, (/ + 1)) / < ^,1 < 5 < 5 

Al{Q,Xs{l + i)) l<k,s = S 

0 l>k 

(14) 



The operation of these equations is shown in FIG. II, which shows the filter 
structure of FIG. 4, but with data flowing in the reverse direction. Given the point k in the 
output path for which the sensitivities are being calculated, the vector of sensitivities 
^f^x^k) is presented at the output channels (1120, 1122.. .1124) of the filter unit (24) 
and cycled in reverse through the filter structure. This reverse operation is indexed by 
/ < A: . At each iteration / , the resulting sensitivity Qj^m"^ (/) is output at the input channel 
(1110) of the filter unit (24). For / < A:the external input at the output channels (1120, 
1122... 1124) is in practice zero vector since Q^x^ (0=0- However, the filter unit (24) 
itself is not constrained to operate under this assumption. 

hi FIG, 7i, the reverse operation of a delay (1130) is represented by q which is the 
unit delay in the reverse time direction since the index / is decreasing at each iteration. 

The reverse operation of a matrix operation (1132, 1134) or a vector operation 
(1136) is represented mathematically as the transpose of the forward operation. The 
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physical justification for this is shown in FIG, 14 which shows the individual channels 
represented by a 3x2-matrix operation which in forward operation maps two input 
channels to three output channels, and in reverse operation maps three input channels to 
two output channels. 

V.2.4 REVERSE RUNTIME OPERATION OF A DELAY UNIT 

The reverse operation of a delay unit (22) corresponds to a delay in the reverse 
sequencing: 



V.2.5 REVERSE RUNTIME OPERATION OF A PREPROCESSING UNIT 

The reverse operation of a preprocessing unit (20) is to scale data received at its 
output node using the inverse of the feedforward scaling shown in equation (1): 



V.3 CONFIGURATION MODE 

The predictive device is configured, in the preferred embodiment, using training 
data collected from the process. However, a process engineer can override any automated 
configuration settings. The training data set should represent one or more data sets which 
have been collected at the same base-time sample rate that will be used by the external 
device controller to present data to the predictive device in prediction mode. Each set of 
data should represent a contiguous sequence of representative. 

In order to allow operator approval or override of the configuration settings, the 
training of the predictive device is done in stages, each stage representing a major 
component of the predictive device. 



(15) 



Q,w^(/) = -Q,u(/) 



(16) 
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V.3.1 CONFIGURING THE PREPROCESSING AND POSTPROCESSING UNITS 

The scale and offset of a preprocessing or postprocessing unit is determined firom 
the desire to map the minimum Emin and maximum Emax of the corresponding variable's 
engineering units to the minimum Mmm and maximum Nmax of the normalized units: 



N -N ■ 



^rnax^^mn (17) 

F N -E N 

_ ^tmx-'^min -^min^ ^ max 

F -F 

The preferred normaUzed units have iV„,„=-l, N„ax=l. The engineering units may be 
different for each input variable, leading to a different scale and offset for each 
preprocessing/postprocessing unit. 



V.3.2 CONFIGURING A DELAY UNIT 
m 10 The configuration of a delay unit (22) is not a central aspect of this appUcation. 

FIG. 12 shows a simple advisory procedure for suggesting delay times. A process 
engineer can override these advisory settings. In this procedure d^m and </max are user 
settable limits for the delay time and the procedure calculates a delay time d such that 



d^^d<d^ 



15 V.3.3 CONFIGURING A FILTER UNIT 

A practical means of configuring a filter unit (24) is an essential aspect of this 
invention. The preferred method of configuration is initialized using the simpUfied filter 
structure shown in FIG. 13 in which all subfilters are first order and decoupled. This is 
the structure used in (Graettinger, et al, 1994). It is important to note that this structure is 

20 used for initialization of the configuration procedure and does not represent the final 
suggested filter configuration. 

Step 1 
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The operator specifies an appropriate dominant time constant 7} associated with 
each input variable. This can be specified firom engineering knowledge or through an 
automated approach such as Frequency Analysis or a Back Propagation Through Time 
algorithm. The value of the initid time constant is not critical the proposed configuration 
method automatically searches the dominant time range for the best values. 

Step 2 

For each input, initialize the filter structure in FIG. 13 using a high order system 
where a number of first order filters are created around the given dominant time constant 
(dominant frequency, dominant dynamics). For example, a fifth order system can be 
created using: 

A/ 

A,3=e'^ (18) 

In this simple filter structure, each subfilter (1302, 1304, 1306) yields a corresponding 
single state (1312, 1314, 1316) which is decoupled firom the other subfilter states. This 
initial filter structure represents the equation 

x(A:)=Ax(^-l) + Bu'(^) (19) 

which has a simplified diagonal block structure of the form 
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B = 



, 0 
A, 
0 
0 

b, 0 
0 

0 0 
0 0 



0 
0 
0 



0 
0 



0 
0 



0 

0 



where 







A, 


0 


0 




0 


0 


_ 0 


0 


"1-A," 


1-A,, 


_1-A,,_ 



0 
0 



(20) 



(21) 



b. = 



Step 3 

Map the contiguous input training data through the delay units (22) and filter 
structure (24) to obtain a set of training state vectors {x(^)jA; = 1, • • • , j} . Then find a 

vector c that provides the best linear mapping of the states to the corresponding target 
outputs {y{k]^k = 1, • ■ • , r} . One way of doing this is to use the Partial Least Squares 
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method that is well known to those skilled in the art. This results in a multi-input, single- 
output (MISO) state space system {A,b,c^ } in which equations (19), (20), and (21) are 
supplemented by the equation: 

y{k)=:c'x (22) 



where 



















c = 

















(23) 



Step 4 

Balance each subsystem {Apb,.,cf }of the MISO block diagonal system based on 

controllability & observability theory. The balancing procedure allows order reduction of 
10 a state space system by transforming the states so that the controllability and observability 
properties of the original system are substantially concentrated in the first part of the state 
vector. 

For each input variable, indexed by z, perform the balancing procedure on the sub- 
system {A.,b,.,cf }. Balancing of a linear state space system is a method of reduction well 

15 known to those skilled in the art. Other methods of model reduction, such as Hankel 
reduction, can be substituted. A summary of the balancing method is now given. 

For each sub-system {A.,b.,cf }, compute the controllability and observability 
Gramians P^. > 0, Q^. > 0 that satisfy the equations: 



A,p,Af-p,=-b.b; 

AfQ,A,-Q,=-c,cf 
20 Find a matrix Ri, using the Cholesky factorization method, such that 
P. = Rf R. 



(24) 



(25) 
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Using the singular value decomposition method, diagonahze to obtain the following 
decomposition: 



R.Q Rf =U S^U^ 

l^l I I I I 



Define 



I * I II 



(26) 



(27) 



then 

T,P,T/-=(t;^)"'q,T:'=S, (28) 
and the balanced subsystem is obtained through a similarity transform on the states as: 

A, =i;.A,i;-',b, =T,b,,c[ =c[t;-' (29) 

steps 

Using balanced subsystems find out dominant time constant for each input by 
reducing each balanced model to a first order model. This is done by considering the 
dynamics of all but the first state of each input's filter unit (24) to have reached steady 
state. This leads to: 



At 



where 



' ln(a,) 



«i=^/n+aL(l-A,-22)'a 



(30) 



i21 



(31) 



and 



(32) 



Check the convergence of the dominant time constant estimation: 
If 
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L^l^ia^^ <s ' (33) 

or the number of iterations has exceeded the maximum allowable, go to step 6. 
Otherwise, return to step 2. The maximum number of iterations and s are parameters of 
the training method. 

5 Step 6 

Once an accurate estimate of the dominant time constant is available for each 
input variable, the eigenvalues controllability gramian F 

(equivalently the observability gramian) are calculated; these are always positive and real 
because the controllability gramian is positive definite. The final order Si of each filter 
Qi 10 unit (24) is then calculated such that 

o 

m (34) 



t 11 



where 0 is parameter of the training method and is a value less than 1, a good practical 
value being 0.95. This order represents the total number of states of an individual filter 
unit (24). 

1 5 After determining the model order, truncate the A . matrix so that just the first Si 

states are used; this truncation is done by selecting the upper left SfX S,- submatrix of A . . 

Then calculate the S, eigenvalues of the truncated A . matrix {a ,.^5 = 1, • • ^ ] . Now 

configure each filter unit (24) using the preferred first and second order subfilter 
configurations with the preferred couplings as shown in FIG 5 through FIG. 8. Use a first 
20 order filter for each real eigenvalue. Use a second order filter for each pair of complex 
eigenvalues {/L, /I } , where, in FIG. 7 (equation 7) or FIG. 8 (equation 8): 
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«u=^+5 (35) 

The preferred ordering of these subfilter units is according to time-constant, with the 
fastest unit being the primary subfilter. 

Another favored approach is to perform model reduction by initializing with 
5 Laguerre type filter units as described in section V.4.2, rather than the simple diagonal 
filter structure of FIG. 13. Sufficient quantity of Laguerre type filter units span the fiill 
range of dynamics in the process, and thus the iterative process described above is not 
needed. In fact a non-linear model reduction can be achieved by performing a linear 
model reduction on the linear system whose states are defmed by the Laguerre filters and 
10 whose outputs are defined by pre-transformed values at the hidden layer of the neural net: 

?* V.3.4 CONFIGURING THE NON-LINEAR ANALYZER 

o 

Q The configuration of the non-linear analyzer (26) is not a central aspect of this 

ijl appUcation. The non-linear analyzer (26) is trained to optimally map the outputs of the 

m 

15 filter units (24) to the corresponding target output. Training of a neural net is described in 
itt detail in (Bishop, 95) for example. In one embodiment, the non-linear analyzer is 

M replaced by apparatus for a constrained non-linear approximator disclosed in U,S. Patent 

AppUcation No. 09/892,586 and herein incorporated by reference. In particular, page 33, 
lines 26-29, as filed in Aplication No. 09/392,586 describes the additional calculations 
20 needed when the inputs to the non-linear approximator are filtered states. 

V.4 UNIVERSALITY OF THE PREDICTION DEVICE 

The predictive device is shown, in this section, to be able to approximate any time 
invariant, causal, fading memory system (defined below). In order to prove this, some 
precise notation and definitions will be needed. 



■at, 
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V,4.1 NOTATION AND DEFINITIONS FOR UNIVERSALITY PROOF 

Let Z denote the integers, the non-negative integers and Z_ the non-positive 
integers respectively. A variable u represents a vector or a sequence in accordance with 
the context, while u(A:) represents a value of the sequence at the particular time L 

For any positive integer p>Oy denotes the normed linear space of real N- 
vectors (viewed as column vectors) with norm juj = maXj^.<^ j . Matrices are denoted in 

uppercase bold. Functions are denoted in italic lowercase if they are scalars and in bold if 
they are vector valued. 

Let /;(z) (respectively /;(Z J and 1^{Z_)), be the space of bounded - 
valued sequences defmed on Z (respectively Z+ and Z_ ) with the norm: 

IHL=sup,,z|u(^)| 

For every decreasing sequence w : Z^ -> (0,l], lim w{k) = 0 define the 

following weighted norm: 

A function F : (Z_ ) -> R is called 2l functional on (Z_ ) , and a function 
3 : (Z. ) -> (z) is called an operator. As a notational simplification the parentheses 
around the arguments of functional and operators are usually dropped; for example, Fn 
rather than f[u] and 3u(^) rather than 3[u](A:) . 

Two specific operators are important. The delay operator defined by 

eX^) = u(yt-^/) 
and the truncation operator defined by 

^ ^ [0 k>0 
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The following definitions make precise the terms used to characterize the class of 
systems approximated by the predictive device. 

Time invariant : An operator 3 is time-invariant if Q^3^ 3(2^ \/d eZ. 
Causality : 3 is causal if u(/) = v(/)V/ <k => 3u(A:) = 3v(A:) . 

5 Fadins Memory : 3 : (z) -> /"^ (z) has fading memory on a subset 

K_ c /JJ (Z„ ) if there is a decreasing sequence w : Z^ {0,\\ lim w{k) = 0 , such that for 
each u, V € K_ and given £• > 0 there is a > 0 such that 

\\uik)-v{k}l<s => |3u(0)-3v(0j<^ 

1^ Every sequence u in (Z_ )can be associated with a causal extension sequence 



10 in/; (Z) defined as: 



iri "^^^"lu(0) )t>0 

m 



and each time invariant causal operator 3 can be associated with a functional F on 
/;(zj defined by 



5 Ai = 3u,(0) 

15 The operator 3 can be recovered from its associated functional F via 

:5n{k)^FPQ''n (36) 

Then, 3 is continuous if and only if F is, so the above equations establish a one to one 
correspondence between time invariant causal continuous operators and functional F on 
/Jj(Z_). In the next the definition of the Laguerre system is given. These can be 

20 configured in the general filter structure of FIG. 4 but also have important theoretical 
properties. 



1086.1007-005 



-26- 



V,4.2 LAGUERRE SYSTEMS 

The set of the Laguerre systems is defined in the complex z-transform plane as: 



where: 

L\ (z) : is the Z transform of {k), the ^-th order system for the Mh input, 
: is the i-th input generating pole, such that \a.\ < 1 . This pole is selected as 

a^=l , where 2]. is the dominant time constant for the i-th input variable. 

: is the time delay associated with the i-ih input variable. 



The whole set of Laguerre systems can be expressed in a state space form that 

shows a decoupled input form and therefore can be mapped to the general filter structure 
in FIG 4. Each filter unit (24) is configured as a single structured {A^B^ } subfilter. The 



The key point here is that the representation is decoupled by input. Balancing can 
be done to decrease the order of the Laguerre systems, and similarity transforms can be 
done on the Laguerre filters in order to simplify the configuration to utilize the preferred 
subfilter units. Similarity transformations do not affect the accuracy of the representation 
and so proving that the use of Laguerre filters decoupled by input approximate any time 
invariant, causal, fading memory system is equivalent to proving the preferred subfilter 
structure can approximate any such system. The balancing is a practical mechanism to 
reduce order without degrading performance. 





structure of A,, is a lower triangular matrix, and b,- = [l 0 - • • o]^ . 
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V.4.3 PROOF OF APPROXIMATION ABILITY OF LAGUERRE SYSTEMS 
First some preliminary results are stated: 

Stone-Weierstrass Theorem (BovdJ985). 

Suppose E is a compact metric space and G a set of continuous functionals on E 
5 that separates points, that is for any distinct u, v e E there is a G e G such that 
Gu^Gy, Then for any continuous functional F on E and given > 0 , there are 

functionals, [Gl.-'Gl^.-'.G^^ .--GsJqG , S^^S- and a polynomial p:R^ -^R, 
such that for all u g E 

10 The reason for the group indexing, which is not necessary for a general statement of the 
Stone-Weierstrass theorem, will become apparent in Lemma 2 when each block with a 
Laguerre operator. In addition, three lemmas are necessary before the theorem can be 
proved. 

Lemma 1 : AT. = {u e (z_ )[o < ||u|| < |, is compact with the ||-||^ norm. 
15 Proof, Let u^^^ be any sequence in . We will find a u^^^ 6 K_ and a subsequence of 

u^^' converging in the IIIL norm to u^^^ It is well know that is not compact in 
(Z_ ) with the usual supremum norm |||^ (Kolmogorov, 1 980). For each /, let be 
^_[-^0] the restriction of to [-/,0], iir^[-/,0] is uniformly bounded by c, and is 
composed of a finite set of values, hence compact in /^[-/,0], Since A!^^[-/,0] is compact 
20 for every /, we can find a subsequence u^^'"^ of u^^^ and a u^^^ e ir_[-/,0] along which 
^{pm) converges: 

sup |u^^'" ' (k) -n^'\ki^O as m^oo (37) 
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Now, let £• > 0 . Since w{k) ^ 0 as A: oo , we can find /Wq > Oa such that ^m^) < s/c 
Since u^^^^^u^^^ e we have that 

sup U^"'« 

Now from equation (37) we can find such that 



1 " 



sup 

-mQ<k<0 



u^^'"^(A:)-" u^^^(^)j <£ for m>m^ 



(39) 



so by equation (38) and equation (39) we can conclude that 

< s for m>m^ 



|L(/^J^u(o) 



which proves that is compact. 

Lemma 2. The set of ftinctional [gI jassociated to the discrete Laguerre Operators are 
continuous with respect to norm, that is, given any ^ > 0 there exists S >0 such that 
llu -\\\ <S => 



Proof, Consider the functional (•) associated with the Laguerre operator I^s 
Given ^ > 0, chose S >0 such that: 



(40) 



1 5 This is possible due to the continxiity of the one dimensional Laguerre operators with 
respect to the weighted norm as shown in (Sentoni et al, 1996). Therefore, from equation 
(40) and the definition of the functionals 



|u-V <J=>M,. -V; <5=> 



g;u-g:v = 



g>,-g;v, <e 



(41) 



which proves Lemma 2 
20 Lemma3. The {g^} separate points in /^(Z_), that is, for any distinct u,ve/^(Z_) 



there is a G; g G such thatG> ^ G> , 
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Proof. Suppose u, v € (Z_ ) are equal except for the i-ih component. Then 



g;u ^ g;v o g>, ^ g;v, 



(42) 



by the definition of flie functionals. It is known from one dimensional theory (Sentoni et 
al, 1996) that for any distinct w-, v. e /""(zj there is a G[ such that G>. ^ G^v. ; this 
result together with equation (42) proves Lemma 3. 
Approximation Theorem 

Now given £>0, Lemmas 1, 2, 3 together with the Stone- Weierstrass theorem 
imply that given any continuous functional Fon K^, there is a polynomial p:R^ R , 
such that for all u e ^_ 



Because the Laguerre systems are continuous and acting on a bounded space, the G^u are 

bounded real intervals on so the polynomial p can be replaced by any static model that 
acts as a universal approximator on a bounded input space, for example, a neural net. In 
other words (43) can be replaced by 



A time invariant causal operator 3 can be recovered from its associated functional 
through equation (36) as 

^n{k) = FPQ''n 
Now let u G if and ^ g Z , so PQ'^n e K_ , hence 




(43) 



|Fu-iw(G;ur-,Giu,--sG,V...,G,>]<^ 



(44) 



\FPQ''u-NN{GlPQ''u,-',GlPQ''n,'-,^^^ 



<€ 



Since the last equation is true for alU g Z , we conclude that for all vlgK^ 



3u-3u 
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In other words, it is possible to approximate any nonlinear discrete time invariant 
operator having fading memory on K , with a finite set of discrete Laguerre systems 
followed by a single hidden layer neural net. This completes the proof 

V.5 EQUIVALENTS 

Although the foregoing details refer to particular preferred embodiments of the 
invention, it should be understood that the invention is not limited to these details. 

Substitutions and alterations, which will occur to those of ordinary skill in the art, can be 
made to the detailed embodiments without departing from the spirit of the invention. 
These modifications are intended to be within the scope of the present invention. 



